High-precision Identification of Discourse New and Unique Noun Phrases

Author

  • Olga Uryupina
Abstract

Coreference resolution systems usually attempt to find a suitable antecedent for (almost) every noun phrase. Recent studies, however, show that many definite NPs are not anaphoric, and the same obviously holds for indefinites. In this study we try to learn two classifications relevant to this problem automatically: ±discourse new and ±unique. We use a small training corpus (MUC-7) but also acquire some data from the Internet. Combining our classifiers sequentially, we achieve 88.9% precision and 84.6% recall for discourse new entities. We expect our classifiers to provide a good prefiltering step for coreference resolution systems, improving both their speed and their performance.
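The abstract describes combining classifiers sequentially and using their output to prefilter noun phrases before coreference resolution. The sketch below shows one generic way to do this, under stated assumptions. It is a cascade in which each stage either decides or abstains, and the first stage that decides wins. The `NP` record, the two rule-based stages and the `UNIQUE_DESCRIPTIONS` word list are hypothetical stand-ins. They are not the features or learner used in this study. The sketch also counts precision and recall for the discourse-new class. Finally, it computes the harmonic mean (F-measure) of the reported 88.9% precision and 84.6% recall. That value is plain arithmetic on the abstract's numbers, not a figure the abstract itself reports.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class NP:
    """Hypothetical noun-phrase record; real systems use far richer features."""
    text: str
    is_definite: bool
    gold_discourse_new: bool  # gold label, used only for evaluation


# A stage returns True (discourse new), False (not discourse new) or None (abstain).
Stage = Callable[[NP], Optional[bool]]


def cascade(stages: List[Stage], default: bool) -> Callable[[NP], bool]:
    """Apply stages in order; the first stage that does not abstain decides."""
    def classify(phrase: NP) -> bool:
        for stage in stages:
            decision = stage(phrase)
            if decision is not None:
                return decision
        return default  # no stage was confident
    return classify


def indefinite_stage(phrase: NP) -> Optional[bool]:
    # Toy rule (assumption): treat every indefinite NP as discourse new.
    return True if not phrase.is_definite else None


# Toy stand-in for a learned uniqueness classifier (hypothetical word list).
UNIQUE_DESCRIPTIONS = {"the white house", "the news media"}


def unique_stage(phrase: NP) -> Optional[bool]:
    # Toy rule (assumption): listed world-knowledge descriptions are discourse new.
    return True if phrase.text.lower() in UNIQUE_DESCRIPTIONS else None


def precision_recall(predicted: Iterable[bool], gold: Iterable[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive (discourse new) class."""
    tp = fp = fn = 0
    for p, g in zip(predicted, gold):
        if p and g:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


nps = [
    NP("a storm", is_definite=False, gold_discourse_new=True),
    NP("the White House", is_definite=True, gold_discourse_new=True),
    NP("the storm", is_definite=True, gold_discourse_new=False),
    NP("the report", is_definite=True, gold_discourse_new=True),
]

# default=False: anything no stage claims stays a candidate anaphor.
classify = cascade([indefinite_stage, unique_stage], default=False)
predicted = [classify(phrase) for phrase in nps]
precision, recall = precision_recall(predicted, [phrase.gold_discourse_new for phrase in nps])

# Prefiltering: only NPs not labeled discourse new go on to antecedent search.
candidates = [phrase.text for phrase, new in zip(nps, predicted) if not new]

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.67
print(candidates)  # ['the storm', 'the report']
print(f"F1 of the reported scores: {f_measure(0.889, 0.846):.3f}")  # 0.867
```

Letting stages abstain means a high-precision stage never has to make decisions about NPs it cannot handle. Any NP that no stage claims falls back to the conservative default and is treated as a possible anaphor. This trades some recall for precision on the discourse-new class. In the toy data, "the report" is missed for exactly that reason.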

Similar resources

The Discourse Structuring Potential of Definite Noun Phrases in Natural Discourse

This paper investigates an alternation found with definite noun phrases in direct object position in Romanian that represents a theoretical puzzle for current theories of Differential Object Marking or pe-marking (Dobrovie-Sorin 1994). When in direct object position and unmodified, definite noun phrases can be accompanied either by the differential object marker pe, or by the simple enclitic de...

The Discourse Structuring Potential of Definite Noun Phrases in Romanian

This paper investigates an alternation found with definite noun phrases in direct object position in Romanian that represents a theoretical puzzle for current theories of Differential Object Marking in this language (Gramatica Limbii Române 2005, Klein & de Swart 2011). When in direct object position and unmodified, definite noun phrases can be accompanied either by the differential object mark...

Corpus-Based Identification of Non-Anaphoric Noun Phrases

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphor...

Extracting noun phrases for all of MEDLINE

A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization; part-of-speech tagging; noun phrase identificati...

Publication date: 2003